-
Notifications
You must be signed in to change notification settings - Fork 916
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Run compute-sanitizer in nightly build #9641
Run compute-sanitizer in nightly build #9641
Conversation
Seems like this should not be going into |
Codecov Report
@@ Coverage Diff @@
## branch-22.02 #9641 +/- ##
================================================
- Coverage 10.49% 10.47% -0.02%
================================================
Files 119 119
Lines 20305 20343 +38
================================================
Hits 2130 2130
- Misses 18175 18213 +38
Continue to review full report at Codecov.
|
rerun tests |
Preferably merge to 21.12 so that memchecks could run in nightly builds for previous branches as well. |
The concern is even the faint possibly of breaking the build at a time when the build is critical (and build team over engaged) in the release cycle. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of nitpicks.
Bigger picture: I was wondering why we're converting to the junit
format.
testcase_name = "" | ||
else: | ||
pass | ||
# raise Exception('unexpected line in compute-sanitizer log: '+line) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. If none of the above, should the tool pass?
I think I can see why we'd leave this comment in. Might be useful to uncomment for debugging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. The log may contain stdout from googletest or pytest which tool will ignore (pytest is not memchecked yet, but will also be too large to add, probably not nightly, may be weekly).
This is added for debug for analyzing only memcheck log. (using --log-file path
)
so that Jenkins can display it similar to how gtest results are shown. Jenkins Junit plugin expects the results in XML. Besides, Whichever is most convenient method to review memcheck results can be used. Is there any preferred way of report for compute-sanitizer results? (should not limited to memcheck but also leakcheck, racecheck, initcheck too). |
Thank you for the reviews @mythrocks and @davidwendt |
…E value - tested locally)
Tested these environment variables in local docker image after simplifying the PR https://github.com/rapidsai/gpuci-scripts/pull/675 |
rerun tests |
rerun tests |
1 similar comment
rerun tests |
rerun tests |
1 similar comment
rerun tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving ops-codeowner
file changes
rerun tests |
@gpucibot merge |
While working on #9641 I noticed that building the iterator gtests takes alot of time in CI. Here is a link to the individual build times for libcudf including the gtests: https://gpuci.gpuopenanalytics.com/job/rapidsai/job/gpuci/job/cudf/job/prb/job/cudf-gpu-test/CUDA=11.5,GPU_LABEL=driver-495,LINUX_VER=ubuntu20.04,PYTHON=3.8/5173/testReport/(root)/BuildTime/ (you can sort by Duration by clicking on table colum header). Here is a table of the top 20 compile time offenders as recorded on my local machine. Note that like the CI build output, 6 of the top 20 are just building the `ITERATOR_TEST` | rank | time (ms) | file | | ---:| ---:|:--- | | 1 | 814334 | /cudf.dir/src/search/search.cu.o | 2 | 755375 | /cudf.dir/src/sort/sort_column.cu.o | 3 | 686235 | /ITERATOR_TEST.dir/iterator/optional_iterator_test_numeric.cu.o | 4 | 670587 | /cudf.dir/src/groupby/sort/group_nunique.cu.o | 5 | 585524 | /cudf.dir/src/reductions/scan/scan_inclusive.cu.o | 6 | 582677 | /ITERATOR_TEST.dir/iterator/pair_iterator_test_numeric.cu.o | 7 | 568418 | /ITERATOR_TEST.dir/iterator/scalar_iterator_test.cu.o | 8 | 563196 | /cudf.dir/src/sort/sort.cu.o | 9 | 548816 | /ITERATOR_TEST.dir/iterator/value_iterator_test_numeric.cu.o | 10 | 535315 | /cudf.dir/src/groupby/sort/sort_helper.cu.o | 11 | 531384 | /cudf.dir/src/sort/is_sorted.cu.o | 12 | 530382 | /ITERATOR_TEST.dir/iterator/value_iterator_test_chrono.cu.o | 13 | 525187 | /cudf.dir/src/join/semi_join.cu.o | 14 | 523726 | /cudf.dir/src/rolling/rolling.cu.o | 15 | 517909 | /cudf.dir/src/reductions/product.cu.o | 16 | 513119 | /cudf.dir/src/stream_compaction/distinct_count.cu.o | 17 | 512569 | /ITERATOR_TEST.dir/iterator/optional_iterator_test_chrono.cu.o | 18 | 508978 | /cudf.dir/src/reductions/sum_of_squares.cu.o | 19 | 508460 | /cudf.dir/src/lists/drop_list_duplicates.cu.o | 20 | 505247 | /cudf.dir/src/reductions/sum.cu.o I made some simple changes to the iterator code logic to use different thrust functions along with a temporary device vector. This approach improved the compile time of the `ITERATOR_TEST` by about 3x. Here are the results of compiling the above 6 files with the changes in this PR. | new rank | new time (ms) | file | | ---:| ---:|:--- | | 59 | 232691 (2.9x) | optional_iterator_test_numeric.cu.o | | 26 | 416951 (1.4x) | pair_iterator_test_numeric.cu.o | | 92 | 165947 (3.4x) | scalar_iterator_test.cu.o | | 65 | 216364 (2.5x) | value_iterator_test_numeric.cu.o | | 77 | 186583 (2.8x) | value_iterator_test_chrono.cu.o | | 111 | 137789 (3.7x) | optional_iterator_test_chrono.cu.o | Total overall build time improved locally by ~3m (10%) using `ninja -j48 install` on a Dell 5820. Here are the build time results of a CI build with these changes. https://gpuci.gpuopenanalytics.com/job/rapidsai/job/gpuci/job/cudf/job/prb/job/cudf-gpu-test/CUDA=11.5,GPU_LABEL=driver-495,LINUX_VER=ubuntu20.04,PYTHON=3.8/5190/testReport/(root)/BuildTime/ Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Nghia Truong (https://github.com/ttnghia) - Devavret Makkar (https://github.com/devavret) URL: #9788
continue | ||
fi | ||
echo "Running GoogleTest $test_name" | ||
${COMPUTE_SANITIZER_CMD} ${gt} | tee "$WORKSPACE/test-results/${test_name}.cs.log" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this have been the following:
${COMPUTE_SANITIZER_CMD} ${gt} --rmm_mode=cuda | tee "$WORKSPACE/test-results/${test_name}.cs.log"
And then you would not need to set and unset the GTEST_CUDF_RMM_MODE
environment variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could use either one for google tests. Initially we wanted to use ctest and cdash to run and report memcheck results. It was not possible to add arguments only for memcheck in ctest. So, we went with environmental variable. Since we want to report to Jenkins, we didn't use ctest/cdash
Hoping to use the environmental variable for both gtests and pytests. (variable name might change)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use the cuda async mode for RMM. That will help the tests run faster by using the CUDA pool but still have memcheck support.
Addresses part of #904
compute-sanitizer --tool memcheck
on libcudf unit tests when envCOMPUTE_SANITIZER_ENABLE=true
This env
COMPUTE_SANITIZER_ENABLE
will be enabled only in nightly builds of cudf. (To be Enabled in PR https://github.com/rapidsai/gpuci-scripts/pull/675)Reports only failures. If no errors, no tests are reported under memcheck results.
Note: Only
memcheck
is enabled now. when required, other checks of compute-sanitizer could be enabled later.